DETAILED ACTION 
Response to Amendment 

1. In response to the office action from 1/24/2005, the applicant has submitted a request for 
continued examination, filed 2/16/2005, amending Claims 1, 11, and 27-28, while arguing to 
traverse the art rejection based on the limitation regarding identifying speech and noise attributes 
for two speech data portions for use in speaker recognition (Amendment, Page 10). The 
applicant's arguments have been fully considered but are moot with respect to the new grounds 
of rejection in view of Tzirkel-Hancock (U.S. Patent: 5,960,395). 

2. The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the 
basis for the rejections under this section made in this Office action: 

A person shall be entitled to a patent unless - 

(b) the invention was patented or described in a printed publication in this or a foreign country or in public use or on 
sale in this country, more than one year prior to the date of application for patent in the United States. 

3. Claims 1, 11, and 27 are rejected under 35 U.S.C. 102(b) as being anticipated by 
Tzirkel-Hancock (U.S. Patent: 5,960,395). 

With respect to Claims 1, 11, and 27, Tzirkel-Hancock discloses: 
Receiving, for speaker recognition, target speech data (Col. 16, Lines 11-35, and method 
use in speaker dependent recognition, Col. 1, Lines 15-25); 

Selecting a pair of distinct portions of said speech data (frame pairs, Col. 16, Lines 36-50); 

Identifying, for each portion, primarily signal attributes and primarily noise attributes 
(identifying speech and noise frames, Col. 13, Lines 37-41); 

Deriving a distance measure for one signal portion by using the primarily signal attributes 
of both signal portions (distance between two speech frame pairs, Col 16, Lines 36-60). 

4. Claims 6-8 and 16-18 are rejected under 35 U.S.C. 102(b) as being anticipated by 
Yamaguchi et al (U.S. Patent: 6,026,359). 

With respect to Claim 6, Yamaguchi discloses: 

Extracting from a noisy speech signal an utterance, said noisy speech signal including a 
first portion with first signal-and-noise attributes and a second portion with second 
signal-and-noise attributes, wherein said utterance is extracted from the noisy speech signal 
based on a first model trained on training speech data (Col. 12, Lines 26-31; and average speech 
and noise spectrums and the entire speech and noise spectrums, Col. 12, Lines 14-46); 

Selectively combining across the noisy speech signal the first and second 
signal-and-noise attributes of both the first and second portions to derive a compensation term 
for the first model (Fig. 5, Elements 9-10); 

Deriving a second model by compensating the first model based on the compensation 
term (Fig. 5, Element 10); and 

Correcting a mismatch indicative of a noise differential between the first portion and the 
second portion based on the second model (Fig. 5, Element 11). 

With respect to Claim 7, Yamaguchi recites: 
A speech model adaptation method, including using a parallel model combination mechanism to 
determine said mismatch as a function of the compensation term, said first model based on a 
plurality of recognition models including at least one speech model and at least one noise model 
(updating (compensating) a noisy speech HMM, Fig. 3, Element 10, in response to a noise level 
difference between input and training speech, Element 9, and Col 11, Lines 45-52. The noisy 
speech HMM is comprised of a combination of a clean speech HMM and a noise HMM, Element 
5, a combination that is well known in the art as parallel model combination, Col. 1, Lines 53- 
55). 
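For readers unfamiliar with the term, parallel model combination can be illustrated with the commonly used log-add approximation, in which the log-spectral means of a clean speech model and a noise model are combined in the linear spectral domain. The following is a minimal sketch under stated assumptions and is not the implementation of Yamaguchi or of the applicant; the function name pmc_log_add, the gain parameter, and the sample values are hypothetical and introduced only for illustration.

```python
import math

def pmc_log_add(clean_log_means, noise_log_means, gain=1.0):
    """Combine log-spectral mean vectors of a clean-speech state and a noise state.

    Assumption: both vectors hold per-channel log-spectral means, and speech and
    noise are additive in the linear spectral domain (log-add approximation).
    """
    if len(clean_log_means) != len(noise_log_means):
        raise ValueError("mean vectors must have the same number of channels")
    return [
        math.log(gain * math.exp(s) + math.exp(n))
        for s, n in zip(clean_log_means, noise_log_means)
    ]

# Hypothetical two-channel example: the combined mean is never lower than
# either of the inputs for a given channel.
noisy = pmc_log_add([2.0, 0.0], [0.0, 0.0])
```

Only the static means are combined in this sketch; variances and dynamic parameters are left out.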

With respect to Claim 8, Yamaguchi discloses: 

A speech model adaptation method, including training the at least one speech model and 
the at least one noise model with the training speech data (speech and noise models comprised of 
training data, Col 5, Lines 21-27). 

Claim 16 contains subject matter similar to Claim 6, and thus, is rejected for similar 
reasons. 

Yamaguchi further discloses use of the model adaptation system and method with a 
computer-readable medium (Col. 16, Line 58 - Col. 17, Line 6). 

Claim 17 contains subject matter similar to Claim 7, and thus, is rejected for similar 
reasons. 

Claim 18 contains subject matter similar to Claim 8, and thus, is rejected for similar 
reasons. 
Claim Rejections - 35 USC § 103 

5. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all 
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set forth in 
section 102 of this title, if the differences between the subject matter sought to be patented and the prior art are 
such that the subject matter as a whole would have been obvious at the time the invention was made to a person 
having ordinary skill in the art to which said subject matter pertains. Patentability shall not be negatived by the 
manner in which the invention was made. 

6. Claims 2-4, 12-14, and 28 are rejected under 35 U.S.C. 103(a) as being unpatentable 
over Tzirkel-Hancock in view of Porter (U.S. Patent: 4,933,973). 

With respect to Claims 2 and 12, Tzirkel-Hancock teaches the method for distance 
comparison between speech frame pairs, as applied to Claim 1. Although Tzirkel-Hancock 
teaches the identification of speech and noise portions, a means of computing a relative noise 
measure for noise within a speech frame by distributing the speech signals over two speech 
signal frames is not taught by the prior art of record; however, Porter teaches the averaging of a 
speech frame pair and the subsequent calculation of an average noise level (Col. 10, Lines 30-51). 

Tzirkel-Hancock and Porter are analogous art because they are from a similar field of 
endeavor in speech recognition systems. Thus, it would have been obvious to a person of 
ordinary skill in the art, at the time of invention, to modify the teachings of Tzirkel-Hancock 
with the averaging of a speech frame pair and the subsequent calculation of an average noise 
level to provide necessary pre-processing for subsequent noise compensation to implement more 
accurate speech recognition in the presence of noise (Porter, Col. 16, Lines 24-29). 

With respect to Claims 3, 13, and 28, Porter additionally recites: 
Combining the signal attributes of the at least two signal portions into a signal content 
and combining the signal and noise attributes of the at least two signal portions into a signal and 
noise content (relative energy and speech and noise level tracker, Col. 10, Line 50 - Col. 11, Line 
4; and Fig. 2, Elements 25-26); 

Calculating a compensation ratio of the signal and noise content to the signal content in 
order to derive the relative noise measure (signal to noise ratio, Col. 8, Lines 11-18); and 

Adjusting a mismatch indicative of a noise differential between the noise components 
present in the training speech data and the noise attributes present in the at least two signal 
portions based on the relative noise measure (modifying training speech data, Col 8, Lines 19- 
22). 
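The arithmetic recited in the limitations above (combining attributes across two portions, then taking the ratio of the signal-and-noise content to the signal content) can be sketched as follows. This is an illustrative reading of the claim language only, not the method of Porter or of the applicant; the function name relative_noise_measure, the use of energies as the attributes, and the sample values are assumptions.

```python
def relative_noise_measure(portions):
    """Return the ratio of signal-and-noise content to signal content.

    portions: list of (signal_energy, noise_energy) pairs, one per signal
    portion. Treating the attributes as energies is an assumption.
    """
    signal_content = sum(s for s, _ in portions)
    signal_and_noise_content = sum(s + n for s, n in portions)
    if signal_content <= 0:
        raise ValueError("signal content must be positive")
    return signal_and_noise_content / signal_content

# Hypothetical pair of portions: (signal, noise) energies.
ratio = relative_noise_measure([(4.0, 1.0), (6.0, 1.0)])
```

A ratio of exactly 1 would indicate no noise content under this reading, and larger values indicate relatively more noise.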

With respect to Claims 4 and 14, Porter further recites: 

Deriving from a training template, a signal profile based on a model trained on the 
training speech data to determine the mismatch between the noise components and the noise 
attributes (training template, Col. 7, Lines 59-68). 

7. Claims 5 and 15 are rejected under 35 U.S.C. 103(a) as being unpatentable over 
Tzirkel-Hancock in view of Porter, and further in view of Yamaguchi et al (U.S. Patent: 6,026,359). 

With respect to Claims 5 and 15, Tzirkel-Hancock in view of Porter teaches the method 
and system for speech model compensation according to a noise level, as applied to Claim 3. 
Tzirkel-Hancock in view of Porter does not specifically suggest the use of parallel model 
combination; however, Yamaguchi discloses such a method (noisy speech HMM is comprised of 
a combination of a clean speech HMM and a noise HMM, Element 5, a combination that is well 
known in the art as parallel model combination, Col. 7, Lines 53-55). 

Tzirkel-Hancock, Porter, and Yamaguchi are analogous art because they are from a 
similar field of endeavor in speech recognition systems. Thus, it would have been obvious to a 
person of ordinary skill in the art, at the time of invention, to modify the teachings of Tzirkel- 
Hancock in view of Porter with the use of parallel model combination as taught by Yamaguchi in 
order to provide an efficient means of quickly adapting recognition models to changing 
background noise to improve speech recognition accuracy (Yamaguchi, Col 2, Lines 20-28). 

8. Claims 9, 10, 19, 20, and 22 are rejected under 35 U.S.C. 103(a) as being unpatentable 
over Yamaguchi et al. 

With respect to Claim 9, Yamaguchi teaches the speech model adaptation method 
featuring means to determine and compensate for a mismatch between noise levels of a speech 
input and training HMM. Yamaguchi does not specifically suggest generating absolute scores 
for speech and noise attributes of a noisy speech signal, however, the examiner takes official 
notice that it is well known in the art to calculate the absolute value (actual amount of difference 
between noise and speech, whether speech exceeds noise or vice versa) of speech and noise 
attributes in order to determine an absolute difference amount between speech and noise for 
comparison to a training HMM, to determine a compensation amount to account for the noise 
level difference between a training HMM and input speech. Thus, in order to determine an 
absolute amount of noise differential to be compared with an initial noise model to further 
calculate a mismatch compensation, it would have been obvious to one of ordinary skill in the art 
at the time of invention to calculate absolute scores to describe speech and noise portions of a 
speech signal. 

With respect to Claim 10, Yamaguchi further recites: 

A speech model adaptation method of Claim [9], wherein combining further includes: 
Normalizing the absolute scores to generate normalized absolute scores for the first and 
second signal-and-noise attributes of both the first and second portions of the noisy speech signal 
(calculating the average spectrum SNR to determine an error amount (compensation), Col. 12, 
Lines 32-46); and 

Calculating the compensation term from the normalized absolute scores (calculating the 
average spectrum SNR to determine an error amount (compensation), Col. 12, Lines 32-46). 

It would have been obvious to one of ordinary skill in the art, at the time of invention, 
that an SNR, a well-known factor in the calculation of noise compensation, of input speech 
would function as the normalized value since it represents signal level with respect to the noise 
level of a noisy speech signal. 
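The sense in which an SNR acts as a normalized value (signal level expressed relative to noise level) can be sketched as follows. This is a minimal illustration and does not reproduce how Yamaguchi computes the average spectrum SNR or the error amount; the function names, the use of per-channel power spectra, and the sample values are assumptions.

```python
import math

def average_spectrum(frames):
    """Average a list of per-frame power spectra channel by channel."""
    if not frames:
        raise ValueError("at least one frame is required")
    return [sum(channel) / len(frames) for channel in zip(*frames)]

def average_spectrum_snr_db(speech_frames, noise_frames):
    """SNR in dB between the averaged speech and averaged noise spectra."""
    speech_power = sum(average_spectrum(speech_frames))
    noise_power = sum(average_spectrum(noise_frames))
    if speech_power <= 0 or noise_power <= 0:
        raise ValueError("powers must be positive")
    return 10.0 * math.log10(speech_power / noise_power)

# Hypothetical two-channel spectra for two speech frames and two noise frames.
snr_db = average_spectrum_snr_db(
    [[8.0, 12.0], [12.0, 8.0]],
    [[1.0, 1.0], [1.0, 1.0]],
)
```

Scaling the speech and noise spectra by the same factor leaves the result unchanged, which is the normalizing property relied on above.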

Claim 19 contains subject matter similar to Claim 9, and thus, is rejected for similar 
reasons. 

Claim 20 contains subject matter similar to Claim 10, and thus, is rejected for similar 
reasons. 

With respect to Claim 22, Yamaguchi discloses: 

Using a training template including a plurality of frames each frame including one or 
more channels each channel including first segments with lower signal-to-noise portions and 
second segments with higher signal-to-noise portions; and compensate the model for the 
mismatch in the utterance and the training template based on the compensation term by counting 
over all the frames of the plurality of frames both the first segments with lower signal-to-noise 
portions and the second segments with higher signal-to-noise portions in the utterance of the 
noisy speech signal (calculating the average spectrum SNR to determine an error amount 
(compensation), Col 12, Lines 32-46). 

It would have been obvious to one of ordinary skill in the art, at the time of invention, 
that the calculation of an average spectrum SNR would function as counting the number of 
frames with lower and higher SNRs since both determine the overall difference between training 
HMMs and input speech. 

9. Claim 21 is rejected under 35 U.S.C. 103(a) as being unpatentable over Yamaguchi in 
view of Kanevsky et al (U.S. Patent: 5,897,616). 

With respect to Claim 21, Yamaguchi teaches the noise model adaptation device as 
applied to Claim 20. Yamaguchi does not teach model adaptation for use in a speaker 
verification and recognition application; however, Kanevsky discloses: 

The model adaptation device, further storing instructions that enable the processor-based 
system to: 

Compare the normalized absolute scores with a threshold associated with a speech profile 
to verify a speaker of the utterance against the speech profile (compare a speaker score to a 
threshold to implement speaker verification, Abstract); and 

Compare the normalized absolute scores with a database including a plurality of speech 
profiles associated with one or more registered speakers to identify the speaker of the utterance 
against the database (identification of a speaker through information contained in a user 
database, Abstract, and Fig. 2, Element 18). 

Yamaguchi and Kanevsky are analogous art because they are from a similar field of 
endeavor in speech recognition. Thus, it would have been obvious to a person of ordinary skill 
in the art, at the time of invention, to combine the means of speaker verification and recognition 
through threshold comparison with user data contained in a database as taught by Kanevsky with 
the noise model adaptation device as taught by Yamaguchi in order to increase the speaker 
recognition accuracy in a variable noisy environment that causes lower recognition accuracy 
(Yamaguchi, Col. 7, Lines 48-52). Therefore, it would have been obvious to combine Kanevsky 
with Yamaguchi for the benefit of obtaining higher recognition accuracy in a noisy speech 
environment through model adaptation, to obtain the invention as specified in Claim 21. 
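The two comparisons recited for Claim 21 (a score against a threshold for verification, and a score against a set of stored profiles for identification) can be sketched as follows. This is a simplified illustration and not the method of Kanevsky or Yamaguchi; the function names, the use of a single scalar score per profile, and the nearest-score rule for identification are assumptions.

```python
def verify_speaker(score, threshold):
    """Accept the claimed identity when the score meets the profile's threshold."""
    return score >= threshold

def identify_speaker(score, profiles):
    """Return the registered speaker whose stored score is closest to `score`.

    profiles: dict mapping a speaker name to a stored reference score.
    Returns None when no speakers are registered.
    """
    if not profiles:
        return None
    return min(profiles, key=lambda name: abs(profiles[name] - score))

# Hypothetical scores and registered speakers.
accepted = verify_speaker(0.8, threshold=0.7)
best_match = identify_speaker(0.8, {"alice": 0.75, "bob": 0.30})
```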

10. Claims 29-30 are rejected under 35 U.S.C. 103(a) as being unpatentable over Yamaguchi 
in view of Eberman et al (U.S. Patent: 5,924,065). 

With respect to Claim 29, Yamaguchi teaches the model adaptation method utilizing a 
storage medium as applied to Claim 6. Yamaguchi does not teach method use in a wireless 
device or in a speaker recognition system; however, Eberman teaches such an embodiment (Col. 
4, Lines 26-30; Col. 8, Lines 55-65). 

Yamaguchi and Eberman are analogous art because they are from a similar field of 
endeavor in speech model adaptation. Thus, it would have been obvious to a person of ordinary 
skill in the art, at the time of invention, to modify the teachings of Yamaguchi with the use of 
noise compensation in a wireless device or in a speaker adaptation system in order to provide a 
well-known use for the model compensation method taught by Yamaguchi, namely to further 
determine the identity of an unknown speaker (Eberman, Col. 4, Lines 24-30). 

With respect to Claim 30, Eberman teaches the cellular communication network as 
applied to Claim 29. 

Allowable Subject Matter 

11. Claims 23-26 are objected to as being dependent upon a rejected base claim, but would 
be allowable if rewritten in independent form including all of the limitations of the base claim 
and the intervening claims. 

12. The following is a statement of reasons for the indication of allowable subject matter: the 
prior art does not teach: 

• With respect to Claim 23, a noise model mismatch compensation, in a device 
utilizing parallel model combination, derived from the ratio of the number of 
frames containing a high SNR and a low SNR over all of the frames. 

• Claims 24-26 contain allowable subject matter because they further limit their 
parent claims. 
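The frame-counting quantity named in the statement for Claim 23 can be illustrated by counting, over all frames, how many fall above and below an SNR cutoff. This is only one possible reading, offered as a sketch; it is not the applicant's claimed computation, and the function name snr_frame_fractions, the threshold_db parameter, and the sample values are assumptions.

```python
def snr_frame_fractions(frame_snrs_db, threshold_db):
    """Return (high_fraction, low_fraction) of frames relative to a cutoff.

    frame_snrs_db: per-frame SNR estimates in dB.
    threshold_db: hypothetical cutoff separating high-SNR from low-SNR frames.
    """
    total = len(frame_snrs_db)
    if total == 0:
        raise ValueError("at least one frame is required")
    high = sum(1 for snr in frame_snrs_db if snr >= threshold_db)
    low = total - high
    return high / total, low / total

# Hypothetical four-frame utterance with a 10 dB cutoff.
high_fraction, low_fraction = snr_frame_fractions([20.0, 3.0, 15.0, 1.0], threshold_db=10.0)
```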
Conclusion 



13. The prior art made of record and not relied upon is considered pertinent to applicant's 
disclosure: 

Ariyoshi (U.S. Patent: 5,212,764) - teaches a method of noise reduction that extracts 
noise and speech data for a number of data channels. 

Gong (U.S. Patent: 6,418,411) - teaches a speech recognition system utilizing speaker 
adaptation and noise compensation. 

McArthur et al (U.S. Patent: 6,473,733) - teaches a method for two-channel noise 
suppression. 

14. Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to James S. Wozniak whose telephone number is (571) 272-7632 
and email is James.Wozniak@uspto.gov. The examiner can normally be reached on Mondays- 
Fridays, 8:30-4:30. 

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, Doris To, can be reached at (703) 305-4827. The fax/phone number for the 
Technology Center 2600 where this application is assigned is (703) 872-9306. 

Any inquiry of a general nature or relating to the status of this application or proceeding 
should be directed to the technology center receptionist whose telephone number is (703) 306- 
0377. 




James S. Wozniak 
4/5/2005 



DAVID L. OMETZ 
PRIMARY EXAMINER 



